library(dplyr)
library(readr)
library(ggplot2)
library(tigris)
library(sf)
library(DT)
library(janitor)
library(tidycensus)
# Cache shapefiles locally (faster repeat runs)
options(tigris_use_cache = TRUE)
Caitlin Uang
December 18, 2025
According to the CDC, 32% of adults in the United States consumed fast food on any given day between August 2021 and August 2023, making access to fast food an important factor when examining potential health-related risks. This project analyzes the geographic distribution of fast food restaurants across the United States to answer the question: How many fast food restaurants per capita are there in each state and county?
The motivation for this analysis is rooted in the overarching question: Are food deserts in the US linked to health problems like diabetes? Understanding where fast food is most concentrated provides insight into differences in food access across communities and helps identify regions that have fewer high-quality or healthy food options. For further information about this project and to see other analytical questions tied to this investigation, click here.
Using the NaNDA: Eating and Drinking Places by Census Tract & ZCTA, 1990–2021 dataset together with the U.S. Census Bureau American Community Survey Table B01003 (2019), the team analyzes the availability of fast food restaurants across the United States and identifies which states and counties have the highest density of fast food restaurants, in support of the team’s overarching question.
To obtain the national fast food count per state and county, this project uses the NaNDA: Eating and Drinking Places by Census Tract & ZCTA, 1990–2021 dataset at the census tract level, filtered to the year 2019. NaNDA is the National Neighborhood Data Archive; it provides counts and densities of food establishments (restaurants, bars, fast food) from 1990 to 2021, sourced from the National Establishment Time Series (NETS) database.
Due to ICPSR access restrictions, the NaNDA dataset was downloaded manually. The team then created a reproducible import pipeline that verifies file availability and performs all subsequent processing steps.
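The file check described above can be wrapped in a small helper so that every manually downloaded file is verified the same way. This is a minimal sketch rather than the project's actual code: `require_data_file()` and its arguments are hypothetical names introduced here for illustration.

```r
# Hypothetical helper: stop with a readable message if a manually
# downloaded file is missing, otherwise return its path unchanged.
require_data_file <- function(path, source_hint) {
  if (!file.exists(path)) {
    stop(
      "Data file not found: ", basename(path), "\n",
      "Please download it from ", source_hint, " and place it in:\n",
      dirname(path),
      call. = FALSE
    )
  }
  invisible(path)
}

# Usage with a temporary file standing in for a real download
tmp <- tempfile(fileext = ".csv")
writeLines("year,count_fastfood", tmp)
checked <- require_data_file(tmp, "the data provider")
```

Returning the path invisibly lets the helper sit directly inside a `read_csv()` call once the check has passed.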
# Set up directory if it doesn't exist
if (!dir.exists("data")) {
dir.create("data", recursive = TRUE)
}
if (!dir.exists(file.path("data", "finalprojectfastfood"))) {
dir.create(file.path("data", "finalprojectfastfood"), recursive = TRUE)
}
nanda_eatdrink <- file.path(
"data",
"finalprojectfastfood",
"nanda_eatdrink_Tract20_1990-2021_01P.csv"
)
# Check that the file exists in the directory
if (!file.exists(nanda_eatdrink)) {
stop(
"NaNDA data file not found.\n",
"Please download 'nanda_eatdrink_Tract20_1990-2021_01P.csv' ",
"from ICPSR Study 208751 and place it in:\n",
normalizePath(file.path("data", "finalprojectfastfood"))
)
}
# Load dataset and use only 2019 fast food count
eatdrink_tract <- read_csv(nanda_eatdrink, show_col_types = FALSE)
fastfood_raw <- eatdrink_tract |>
filter(year == 2019) |>
arrange(desc(count_fastfood)) |>
select(tract_fips20, totpop, count_fastfood)
The U.S. Census Bureau’s American Community Survey (ACS) Table B01003 provides estimates of total population for a given geographic area, serving as a foundational demographic measure in this analysis. The ACS B01003 table used here is based on the 2019 five-year estimates, which are designed to provide more reliable population figures, particularly for smaller geographic units such as census tracts and counties.
# Load dataset
#pop_2019 <- read_csv("ACSDT5Y2019.B01003-Data.csv")
pop_file <- file.path(
"data",
"finalprojectfastfood",
"ACSDT5Y2019.B01003-Data.csv"
)
if (!file.exists(pop_file)) {
stop(
"ACS data file not found.\n",
# Trailing space keeps the file name and "from" apart in the message
"Please download 'ACSDT5Y2019.B01003-Data.csv' ",
"from https://data.census.gov/table/ACSDT5Y2019.B01003?q=Population+Total&g=010XX00US$1400000 and place it in:\n",
normalizePath(file.path("data", "finalprojectfastfood"))
)
}
pop_2019 <- read_csv(pop_file, show_col_types = FALSE)
The NaNDA dataset identifies locations using census tract-level FIPS codes, which are standard numeric identifiers assigned to each census tract in the U.S., while ACS B01003 uses GEOIDs, which are longer codes that uniquely identify administrative and statistical geographic areas. Each tract-level GEOID contains the tract FIPS code, so the FIPS code extracted from the GEOID is used to join the two datasets. This provides a reliable way to align fast food counts with population figures at the census tract level without losing geographic accuracy.
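To make the key extraction concrete, the sketch below applies the same last-11-characters rule to two example tract-level `GEO_ID` strings. The example values and the `nanda_keys` vector are illustrative assumptions, not rows taken from the project data.

```r
# Example ACS-style GEO_ID values (illustrative): a summary-level prefix
# followed by the 11-digit tract code (2 state + 3 county + 6 tract digits)
geo_id <- c("1400000US01001020100", "1400000US06037101110")

# Keep the last 11 characters, mirroring the substr() call in the pipeline
tract_fips <- substr(geo_id, nchar(geo_id) - 10, nchar(geo_id))

# The same key also yields state and county codes by position
state_fips  <- substr(tract_fips, 1, 2)
county_fips <- substr(tract_fips, 1, 5)

# Sanity checks worth running before a join: key length and unmatched keys
all_eleven <- all(nchar(tract_fips) == 11)
nanda_keys <- c("01001020100", "06037101110", "56001962700")  # hypothetical
unmatched  <- setdiff(nanda_keys, tract_fips)
```

Counting the entries in `unmatched` before joining is one way to see how many tracts would end up without a population value.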
# Get FIPS from GEOID
pop_clean <- pop_2019 |>
mutate(tract_fips20 = substr(GEO_ID, nchar(GEO_ID) - 10, nchar(GEO_ID)))
# Join ACS and fast food dataset by FIPS code and filter out tract with 0 population
fastfood_tract <- fastfood_raw |>
left_join(pop_clean, by = "tract_fips20") |>
select(tract_fips20,GEO_ID,totpop,B01003_001E,count_fastfood) |>
rename(acs_pop = B01003_001E) |>
mutate(acs_pop = as.numeric(acs_pop)) |>
filter(acs_pop > 0)
Let’s take a closer look at the joined dataset. Now that the NaNDA fast food restaurant data and the ACS population data have been combined at the census tract level, we can explore how fast food restaurant counts and population figures align geographically. The table below displays key variables for each census tract, including the tract FIPS code, GEOID, total population, and fast food restaurant count.
fastfood_tract |>
select(
FIPS = tract_fips20,
GEOID = GEO_ID,
`Total Population` = acs_pop,
`Fast Food Count` = count_fastfood
) |>
datatable(
caption = "Fast Food Restaurants and Population by Census Tract",
rownames = FALSE,
options = list(
pageLength = 10,
scrollX = TRUE
)
) |>
formatRound("Total Population", 0, mark = ",") |>
formatRound("Fast Food Count", 0)
Looking at summary statistics helps us understand the distribution of fast food availability across census tracts.
The distribution of fast food restaurants across census tracts is highly right skewed because a large number of census tracts have no fast food restaurants. As the number of restaurants increases, the number of census tracts decreases rapidly.
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 0.000 0.000 0.815 1.000 22.000
fastfood_tract |>
ggplot(aes(x = count_fastfood)) +
geom_histogram(
bins = 50,
fill = "#9badff",
color = "white",
linewidth = 0.3
) +
geom_vline(
xintercept = median(fastfood_tract$count_fastfood, na.rm = TRUE),
linetype = "dashed",
linewidth = 0.8,
color = "#0432FF"
) +
labs(
title = "Distribution of Fast Food Count by Census Tract",
caption = "Source: NaNDA Eating and Drinking Places by Census Tract & ZCTA, 1990–2021\nU.S. Census Bureau ACS B01003 (2019)",
x = "Fast Food Count",
y = "# of Census Tracts"
) +
theme(
plot.title = element_text(size = 18, face = "bold", hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5),
plot.caption = element_text(size = 10, face = "italic", hjust = 0)
)
The distribution of population across census tracts is mildly right skewed, with most tracts falling between roughly 2,000 and 6,000 people and very few extreme outliers. The median is 3,852, representing the population of a typical census tract.
Min. 1st Qu. Median Mean 3rd Qu. Max.
2 2792 3852 3981 5008 38754
fastfood_tract |>
ggplot(aes(x = acs_pop)) +
geom_histogram(
bins = 50,
fill = "#9badff",
color = "white",
linewidth = 0.3
) +
geom_vline(
xintercept = median(fastfood_tract$acs_pop, na.rm = TRUE),
linetype = "dashed",
linewidth = 0.8,
color = "#0432FF"
) +
labs(
title = "Distribution of Population by Census Tract",
caption = "Source: NaNDA Eating and Drinking Places by Census Tract & ZCTA, 1990–2021\nU.S. Census Bureau ACS B01003 (2019)",
x = "Population",
y = "# of Census Tracts"
) +
theme(
plot.title = element_text(size = 18, face = "bold", hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5),
plot.caption = element_text(size = 10, face = "italic", hjust = 0)
)
Now that we have a better understanding of the distribution of fast food counts and population by census tract, the team shifted focus to fast food density, measured as the count of fast food restaurants per 1,000 people, to allow a relative comparison of fast food access across census tracts. Early in this project, the team found that looking only at the count of fast food restaurants is misleading because census tracts vary in the number of people living in them and fast food restaurants are unevenly distributed. For example, states with larger total populations tend to have more fast food restaurants. Using fast food density allows for a more meaningful comparison because it accounts for population size.
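A small worked example shows why raw counts and per-capita density can rank places differently. The two areas and their numbers below are hypothetical and chosen only to illustrate the arithmetic.

```r
# Hypothetical areas: counts alone rank the large area first,
# while the per-1,000 rate ranks the small area first.
areas <- data.frame(
  area           = c("Large area", "Small area"),
  population     = c(1000000, 50000),
  count_fastfood = c(200, 20)
)

# Fast food restaurants per 1,000 people
areas$den_fastfood <- areas$count_fastfood / areas$population * 1000

top_by_count   <- as.character(areas$area[which.max(areas$count_fastfood)])
top_by_density <- as.character(areas$area[which.max(areas$den_fastfood)])
```

Here the large area has ten times as many restaurants but half the density (0.2 versus 0.4 per 1,000 people).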
In the next section, we will look at fast food density at the county and state level to examine any patterns that can help us answer the overarching question.
To examine fast food density at the national scale, we first aggregated fast food restaurant counts from the census tract level to the state level. Looking at the data at the state level reduces local variability caused by census tracts with small populations and provides a clearer view of regional differences in fast food restaurant availability. The state FIPS code is extracted from each census tract, and fast food restaurant counts and population are summed before calculating fast food density by dividing the count of fast food restaurants by the population and multiplying by 1,000. Next, we joined a 2019 U.S. state boundary shapefile with the aggregated fast food density to create a choropleth map.
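The order of operations matters here: summing counts and population before dividing gives a population-weighted density, whereas averaging tract-level densities would give every tract equal weight. A minimal sketch with three hypothetical tracts (the values are invented for illustration):

```r
# Hypothetical tracts in one state
tracts <- data.frame(
  tract_fips20   = c("01001020100", "01001020200", "01003010100"),
  acs_pop        = c(1000, 4000, 5000),
  count_fastfood = c(2, 0, 1)
)

# State-level density as described in the text: sum first, then divide
state_density <- sum(tracts$count_fastfood) / sum(tracts$acs_pop) * 1000

# Averaging tract-level densities instead gives every tract equal weight
mean_of_tract_densities <- mean(tracts$count_fastfood / tracts$acs_pop * 1000)
```

With these numbers the summed approach gives 0.3 restaurants per 1,000 people, while the unweighted average of tract densities is about 0.73 because the small tract dominates.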
# Aggregate tract to state
fastfood_state <- fastfood_tract |>
mutate(state_fips20 = substr(tract_fips20, 1, 2)) |>
group_by(state_fips20) |>
summarize(
total_fastfood = sum(count_fastfood, na.rm = TRUE),
total_pop = sum(acs_pop, na.rm = TRUE),
den_fastfood = (total_fastfood / total_pop) * 1000,
.groups = "drop"
) |>
filter(total_pop > 0, !is.na(den_fastfood))
# Load state shapefile and join
state_shapes <- states(year = 2019, cb = TRUE) |>
mutate(STATEFP = as.character(STATEFP))
fastfood_map_state <- state_shapes |>
left_join(fastfood_state, by = c("STATEFP" = "state_fips20"))
This choropleth shows a clear regional pattern: the South appears in darker shades, indicating higher fast food density than the Northeast, Midwest, and West. There is a notable exception in the West. Let’s now look at a table of state-level fast food density values with the corresponding fast food counts and total population.
# Create heat map
ggplot(fastfood_map_state) +
geom_sf(aes(fill = den_fastfood), color = "white") +
scale_fill_gradient(
low = "#ffffff",
high = "#932092",
name = "Fast Food\nDensity", limits = c(NA,NA)
)+
coord_sf(
xlim = c(-125, -66),
ylim = c(24, 50),
expand = FALSE
) +
labs(
title = "Fast Food Density by State",
caption = "Source: NaNDA Eating and Drinking Places by Census Tract & ZCTA, 1990–2021\nU.S. Census Bureau ACS B01003 (2019)"
) +
theme_void() +
theme(
legend.position = "right",
plot.title = element_text(size = 18, face = "bold", hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5),
plot.margin = margin(5, 5, 5, 5),
plot.caption = element_text(size = 10, face = "italic", hjust = 0)
)
These results show a clear trend of Southern states dominating the top rankings, with 8 out of the 10 states with the highest fast food density. Mississippi ranks highest with 0.30 fast food restaurants per 1,000 people. Wyoming stands out as an exception outside the South, sitting at rank 3 with a fast food density of 0.28.
This pattern suggests that the South and select western regions may face challenges in accessing healthy food and are saturated with convenient fast food options.
# Create datatable
datatable(
fastfood_map_state |>
st_drop_geometry() |>
arrange(desc(den_fastfood)) |>
slice_max(den_fastfood, n = 10) |>
select(
State = NAME,
`Fast Food Density` = den_fastfood,
`Total Population` = total_pop,
`Fast Food Count` = total_fastfood),
rownames = FALSE,
caption = "Top 10 States with the Highest Fast Food Density",
options = list(
searching = FALSE,
scrollX = FALSE,
autoWidth = TRUE
)
) |>
formatRound("Fast Food Density", 2) |>
formatRound("Total Population", 0, mark = ",") |>
formatRound("Fast Food Count", 0)
While the state-level analysis provides a useful high-level overview of fast food restaurant availability across the United States, it can hide substantial variation within states. Larger and more diverse states may have both urban counties with relatively low fast food density and rural counties with disproportionately higher fast food density when adjusted for population size. To better capture this local heterogeneity and identify counties with higher fast food density, let’s shift the analysis from the state level to the county level. Examining the joined dataset at the county level gives a more granular understanding of how fast food restaurants are distributed and helps reveal patterns that may be more directly relevant to county-level health problems like diabetes and obesity, and to the quality food access at the heart of the overarching question.
Similar to the state level analysis, let’s take a look at the choropleth map at the county level to see any patterns.
#State look up
state_lookup <- states(year = 2019) |>
st_drop_geometry() |>
select(STATEFP, STUSPS, NAME) |>
rename(
State_Abbr = STUSPS,
State_Name = NAME
)
# Aggregate tract to county
den_fastfood_county <- fastfood_tract |>
mutate(county_fips20 = substr(tract_fips20, 1, 5)) |>
group_by(county_fips20) |>
summarize(
total_fastfood = sum(count_fastfood, na.rm = TRUE),
total_pop = sum(acs_pop, na.rm = TRUE),
den_fastfood = (total_fastfood / total_pop) * 1000,
.groups = "drop"
) |>
filter(total_pop > 0, !is.na(den_fastfood))
# Load shapefile and join
county_shapes <- counties(year = 2019, cb = TRUE) |>
mutate(GEOID = as.character(GEOID))
fastfood_map_county <- county_shapes |>
left_join(den_fastfood_county, by = c("GEOID" = "county_fips20")) |>
left_join(state_lookup, by = "STATEFP") |>
mutate(
den_fastfood_plot = ifelse(den_fastfood == 0, NA, den_fastfood)
)
At the county level, the choropleth shows wide variation in fast food density within states. Counties shown in gray have zero fast food restaurants, while darker shades indicate higher fast food density. Several darker clusters appear in parts of Virginia, Mississippi, and Colorado, showing that some counties experience much higher fast food concentration than surrounding areas. Overall, the map highlights how food access can vary dramatically from one county to another, reinforcing the importance of looking at local, county-level patterns when analyzing food deserts.
ggplot(fastfood_map_county) +
geom_sf(aes(fill = den_fastfood), color = NA, linewidth = 0.1) +
scale_fill_gradientn(
colours = c("grey90", "#e792e7", "#5b7cff", "#0432FF"),
name = "Fast Food\nDensity",
limits = c(NA, NA),
na.value = "#ffffff",
trans = "sqrt"
) +
coord_sf(
xlim = c(-125, -66),
ylim = c(24, 50),
expand = FALSE
) +
labs(
title = "Fast Food Density by County",
caption = "Source: NaNDA Eating and Drinking Places by Census Tract & ZCTA, 1990–2021\nU.S. Census Bureau ACS B01003 (2019)"
) +
theme_void() +
theme(
legend.position = "right",
plot.title = element_text(size = 18, face = "bold", hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5),
plot.margin = margin(5, 5, 5, 5),
plot.caption = element_text(size = 10, face = "italic", hjust = 0)
)
The distribution of fast food density across U.S. counties is right skewed: most counties have relatively low fast food restaurant density, and a small number have very high values. The median county has approximately 0.17 fast food restaurants per 1,000 people, and the mean (0.173) sits slightly above the median (0.165), reflecting the influence of a small number of high-density counties. The first quartile is zero, meaning that at least 25% of counties have no fast food restaurants recorded. In contrast, the maximum density reaches 1.82, highlighting the presence of extreme outliers. The histogram also shows a large spike of counties at zero, although the positive median indicates that these counties are not a majority. Sixty-nine counties in the shapefile have no matched density value (NA) and are excluded from these statistics.
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.0000 0.0000 0.1654 0.1726 0.2552 1.8240 69
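The relationship between the zero share, the median, and the mean in a summary like the one above can be checked directly. The sketch below uses a short hypothetical vector of county densities, not the project data, to show the calculations.

```r
# Hypothetical county densities (restaurants per 1,000 people), with one NA
den <- c(0, 0, 0, 0.1, 0.15, 0.18, 0.2, 0.25, 0.3, 1.8, NA)

share_zero <- mean(den == 0, na.rm = TRUE)  # fraction of counties at zero
med        <- median(den, na.rm = TRUE)
avg        <- mean(den, na.rm = TRUE)

# A median above zero means fewer than half of the counties are at zero;
# a mean above the median is the usual signature of a right tail.
majority_zero <- share_zero > 0.5
```

In this toy vector 30% of the counties are at zero, the median is 0.165, and the single large value pulls the mean up to about 0.30.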
fastfood_map_county |>
st_drop_geometry() |>
ggplot(aes(x = den_fastfood)) +
geom_histogram(
bins = 30,
fill = "#9badff",
color = "white",
linewidth = 0.3
) +
geom_vline(
xintercept = median(fastfood_map_county$den_fastfood, na.rm = TRUE),
linetype = "dashed",
linewidth = 0.8,
color = "#0432FF"
) +
coord_cartesian() +
labs(
title = "Distribution of Fast Food Density per County",
caption = "Source: NaNDA Eating and Drinking Places by Census Tract & ZCTA, 1990–2021\nU.S. Census Bureau ACS B01003 (2019)",
x = "Fast Food Density",
y = "# of Counties"
) +
theme(
axis.line = element_line(color = "black", linewidth = 0.4),
plot.title = element_text(size = 18, face = "bold", hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5),
plot.margin = margin(5, 5, 5, 5),
plot.caption = element_text(size = 10, face = "italic", hjust = 0),
panel.background = element_rect(fill = "white")
)
The counties with the highest fast food density are overwhelmingly small-population areas, where even a handful of fast food restaurants results in a high per-capita value. Teller County, Colorado ranks highest, with just four fast food establishments and a population of roughly 2,200 in the joined data, producing a density of 1.82 restaurants per 1,000 people. These counties are concentrated in a few states, most notably Virginia, Mississippi, Georgia, and Colorado, which are in the South and the West.
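The small-population effect can be made explicit by asking how much one additional restaurant changes the rate in a small versus a large county. The counties below are hypothetical, and the `min_pop` threshold is an assumption introduced for illustration; it is not part of the original analysis.

```r
# Hypothetical counties; the first mirrors the roughly 4-per-2,200 case above
counties <- data.frame(
  county         = c("A", "B", "C"),
  total_fastfood = c(4, 45, 400),
  total_pop      = c(2200, 150000, 2000000)
)
counties$den_fastfood <- counties$total_fastfood / counties$total_pop * 1000

# One extra restaurant shifts the rate far more in a small county
delta_small <- 1 / 2200 * 1000
delta_large <- 1 / 2000000 * 1000

# Assumed minimum population for a more stable ranking (illustrative only)
min_pop <- 10000
stable  <- counties[counties$total_pop >= min_pop, ]
stable  <- stable[order(-stable$den_fastfood), ]
stable_order <- as.character(stable$county)
```

A ranking restricted to counties above some population threshold is one possible sensitivity check on a top-10 list driven by very small denominators.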
# Create datatable
datatable(
fastfood_map_county |>
st_drop_geometry() |>
arrange(desc(den_fastfood)) |>
slice_max(den_fastfood, n = 10) |>
select(
County = NAME,
State = State_Name,
`Fast Food Count` = total_fastfood,
`Total Population` = total_pop,
`Fast Food Density` = den_fastfood
),
caption = "Top 10 Counties with the Highest Fast Food Density",
rownames = FALSE,
options = list(
pageLength = 10,
scrollX = TRUE
)
) |>
formatRound("Fast Food Count", 0, mark = ",") |>
formatRound("Total Population", 0, mark = ",") |>
formatRound("Fast Food Density", 2)
This scatter plot shows the relationship between county population size and fast food restaurant density. Counties with smaller populations tend to exhibit higher fast food density, while more populous counties cluster at lower density levels. This pattern suggests that higher fast food density in some counties may reflect limited food options overall, rather than an overabundance of fast food restaurants alone.
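One way to quantify the direction of this relationship, beyond a fitted line, is a rank correlation, which is less sensitive to a few extreme counties. The sketch below uses hypothetical values and is a possible supplementary check, not a step from the original analysis.

```r
# Hypothetical counties where density falls as population rises
total_pop    <- c(2000, 8000, 30000, 120000, 900000)
den_fastfood <- c(0.90, 0.45, 0.30, 0.22, 0.18)

# Spearman correlation works on ranks, so outliers carry less weight
rho <- cor(total_pop, den_fastfood, method = "spearman")

# Pearson correlation on a log10 population scale, for comparison
r_log <- cor(log10(total_pop), den_fastfood)
```

Because county populations span several orders of magnitude, plotting the x-axis with ggplot2's `scale_x_log10()` may also spread out the small counties that are otherwise compressed near the origin.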
fastfood_map_county |>
st_drop_geometry() |>
ggplot(aes(x = total_pop, y = den_fastfood)) +
geom_point(
alpha = 0.15,
size = 1,
color = "#0432FF"
) +
geom_smooth(
method = "lm",
se = FALSE,
linewidth = 1,
color = "#e792e7"
) +
scale_x_continuous(labels = scales::comma) +
coord_cartesian(ylim = c(0, 1)) +
labs(
title = "Fast Food Density vs. County Population",
x = "County Population",
y = "Fast Food Restaurants per 1,000 People",
caption = "Source: NaNDA Eating and Drinking Places by Census Tract & ZCTA (1990–2021)\nU.S. Census Bureau ACS B01003 (2019)"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 18, face = "bold", hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5),
plot.margin = margin(5, 5, 5, 5),
plot.caption = element_text(size = 10, face = "italic", hjust = 0)
)
In this project, the Green Apple team set out to explore fast food availability across the United States as part of the broader question: are food deserts in the U.S. linked to health problems like diabetes? By using fast food density at both the state and county levels, we found that fast food availability is highly uneven across the country and varies substantially within states. While state-level patterns highlighted broad regional trends, county-level analysis revealed localized pockets of high fast food density that would be hidden by state averages alone.
At the county level, fast food density was strongly right skewed, with most counties exhibiting low density and a small number showing very high values. These high-density counties were often rural and sparsely populated, where even a few fast food restaurants resulted in elevated per capita values. The highest-ranked counties further show that high fast food density does not necessarily indicate an abundance of fast food options, but may instead reflect limited access to alternative food outlets.
The population versus density analysis helped tie these findings together by showing that counties with smaller populations tend to have higher fast food density, while larger counties cluster at much lower levels. This pattern suggests that elevated fast food exposure in some areas may be more indicative of limited food choice overall rather than an oversupply of fast food alone. From a food deserts perspective, this supports the idea that certain communities may face constraints in accessing diverse and nutritious food options.
Although this analysis does not directly test health outcomes such as diabetes, it provides important context for understanding how food environments are distributed across communities. By identifying where fast food exposure is concentrated and how population structure shapes per capita measures, this work lays the groundwork for future analyses that directly link food access, food deserts, and diet-related health outcomes.
In the overarching question report, the team assesses whether neighborhood food environments help explain variation in diabetes by estimating a linear regression model of tract-level diabetes prevalence using median household income, fast food restaurant density, and a low-access food desert indicator as predictors. To view the results, see here.